Western AI
DATE
October 2025 - Present
DESCRIPTION
Built a plagiarism detection pipeline in Python that scores pairs of source code files on token overlap, program output similarity, and embedding-based semantic similarity, then feeds those scores to an MLP that classifies each pair as plagiarized or not plagiarized. Cleaned and preprocessed the code submissions, a labeled dataset of 13,500 C++ code pairs, with pandas and NumPy.
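A minimal sketch of how the three pairwise scores could be assembled into one feature vector per code pair. This is illustrative rather than the project's actual code: the function names, the regex tokenizer, the choice of Jaccard overlap, and the use of `difflib` for output comparison are all assumptions, and the embeddings are assumed to be precomputed elsewhere.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical tokenizer: identifiers/keywords, integer literals, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split source text into a flat list of coarse tokens."""
    return TOKEN_RE.findall(code)


def token_jaccard(code_a, code_b):
    """Jaccard overlap of the two files' token sets (assumed overlap measure)."""
    a, b = set(tokenize(code_a)), set(tokenize(code_b))
    if not a and not b:
        return 1.0  # two empty files are treated as identical
    return len(a & b) / len(a | b)


def output_similarity(out_a, out_b):
    """Character-level similarity ratio of the two programs' captured outputs."""
    return SequenceMatcher(None, out_a, out_b).ratio()


def cosine_similarity(u, v):
    """Cosine similarity of two precomputed embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pair_features(code_a, code_b, out_a, out_b, emb_a, emb_b):
    """Build the 3-value feature vector that a classifier such as an MLP would consume."""
    return [
        token_jaccard(code_a, code_b),
        output_similarity(out_a, out_b),
        cosine_similarity(emb_a, emb_b),
    ]
```

Each labeled pair would yield one such vector, and the stacked vectors with their labels would form the classifier's training data.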
TECHNOLOGIES
Pandas
NumPy
Hugging Face
Python